Wikipedia Mining for Triple Extraction Enhanced by Co-reference Resolution
Abstract
Since Wikipedia has become a large-scale database storing a wide range of human knowledge, it is a promising corpus for knowledge extraction. A considerable amount of research on Wikipedia mining has been conducted, confirming that Wikipedia is an invaluable corpus. Wikipedia’s impressive characteristics are not limited to its scale, but also include its dense link structure, URIs for word sense disambiguation, well-structured Infoboxes, and the category tree. In previous research in this area, the category tree has been widely used to extract semantic relations among concepts in Wikipedia. In this paper, we try to extract triples (Subject, Predicate, Object) from Wikipedia articles, another promising resource for knowledge extraction. We propose a practical method that integrates link structure mining and parsing to enhance extraction accuracy. The proposed method consists of two technical novelties: two parsing strategies and a co-reference resolution
Similar resources
(Co)Reference Resolution in Wikipedia
We present a coreference resolution system aimed at improving information extraction from Wikipedia. The proposed system is trained on a corpus of Wikipedia articles manually annotated with coreference and reference information. Experimental results demonstrate that highly accurate coreference decisions can be made for 75% of the input, and furthermore indicate that Wikipedia’s structure can be...
Building a Corpus for Named Entity Recognition using Portuguese Wikipedia and DBpedia
Some natural language processing tasks can be learned from example corpora, but having enough examples for the task at hand can be a bottleneck. In this work we address how Wikipedia and DBpedia, two freely available language resources, can be used to support Named Entity Recognition, a fundamental task in Information Extraction and a necessary step of other tasks such as Co-reference Resoluti...
Wikipedia Mining: Wikipedia as a Corpus for Knowledge Extraction
Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts of various fields such as Arts, Geography, History, Science, Sports and Games. As a corpus for knowledge extraction, Wikipedia’s impressive characteristics are not limited to the scale, but also include the dense link structure, word sense disambiguation bas...
A Learning Based Model for Chinese Co-reference Resolution by Mining Contextual Evidence
This paper presents a learning based model for Chinese co-reference resolution, in which diverse contextual features are explored inspired by related linguistic theory. Our main motivation is to try to boost the co-reference resolution performance only by leveraging multiple shallow syntactic and semantic features, which can escape from tough problems such as deep syntactic and semantic structu...
Wikipedia Link Structure and Text Mining for Semantic Relation Extraction
Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts in various fields such as Arts, Geography, History, Science, Sports and Games. Since it is becoming a database storing all human knowledge, Wikipedia mining is a promising approach that bridges the Semantic Web and the Social Web (a.k.a. Web 2.0). In fact, i...